Joint Max Margin and Semantic Features for Continuous Event Detection in Complex Scenes
نویسندگان
چکیده
In this paper the problem of complex event detection in the continuous domain (i.e. events with unknown starting and ending locations) is addressed. Existing event detection methods are limited to features that are extracted from the local spatial or spatio-temporal patches from the videos. However, this makes the model vulnerable to the events with similar concepts e.g. “Open drawer” and “Open cupboard”. In this work, in order to address the aforementioned limitations we present a novel model based on the combination of semantic and temporal features extracted from video frames. We train a max-margin classifier on top of the extracted features in an adaptive framework that is able to detect the events with unknown starting and ending locations. Our model is based on the Bidirectional Region Neural Network and large margin Structural Output SVM. The generality of our model allows it to be simply applied to different labeled and unlabeled datasets. We finally test our algorithm on three challenging datasets, “UCF 101-Action Recognition”, “MPII Cooking Activities” and “Hollywood”, and we report state-of-the-art performance. c © 2017 Elsevier Ltd. All rights reserved.
منابع مشابه
Scene Aligned Pooling for Complex Video Recognition
Real-world videos often contain dynamic backgrounds and evolving people activities, especially for those web videos generated by users in unconstrained scenarios. This paper proposes a new visual representation, namely scene aligned pooling, for the task of event recognition in complex videos. Based on the observation that a video clip is often composed with shots of different scenes, the key i...
متن کاملLarge scale continuous visual event recognition using max-margin Hough transformation framework
In this paper we propose a novel method for continuous visual event recognition (CVER) on a large scale video dataset using max-margin Hough transformation framework. Due to high scalability, diverse real environmental state and wide scene variability direct application of action recognition/detection methods such as spatio-temporal interest point (STIP)-local feature based technique, on the wh...
متن کاملTraffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
متن کاملA Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
متن کاملRGB-D Object Detection and Semantic Segmentation for Autonomous Manipulation in Clutter
Autonomous robotic manipulation in clutter is challenging. A large variety of objects must be perceived in complex scenes, where they are partially occluded and embedded among many distractors, often in restricted spaces. To tackle these challenges, we developed a deep-learning approach that combines object detection and semantic segmentation. The manipulation scenes are captured with RGB-D cam...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1706.04122 شماره
صفحات -
تاریخ انتشار 2017